Reuse rollout token counts across limit checks #1799
Open
xeophon wants to merge 1 commit into
Approvability (Macroscope)
Verdict: Needs human review. Changes token limit enforcement logic from single-trace properties to summing across branches, altering when limits trigger. This gates whether rollouts continue and introduces new computation paths that warrant human review.
af6db01 to
5f30b53
Compare
Overview
Reduce synchronous token-limit overhead by avoiding repeated reconstruction of the same derived branch paths. The trace node graph remains the source of truth, limit precedence and soft-cap behavior are unchanged, and the commit only changes the production interception server.
What changed
- Count directly from Trace.nodes when canonical append order proves the graph is a single root-to-leaf path.
- Build Trace.branches once for compacted, subagent, or otherwise non-linear graphs, then reuse that view across enabled input, output, and total token checks.
- Preserve the max_turns → input → output → total precedence and >= boundaries.
Why
Trace.nodes stores each message once, while Trace.branches is an uncached derived view built by finding leaves and walking each parent chain. The previous input, output, and total properties each requested that view independently. When several token caps were enabled and still below their thresholds, the same graph paths were reconstructed up to three times.
The canonical linear case can safely use the existing node order without allocating a branch view. Other graph shapes continue through the established branch abstraction, but share one snapshot rather than rebuilding it for every count. This keeps arbitrary node ordering and branching semantics on the existing path.
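To make the two graph shapes concrete, here is a minimal sketch of the linearity test and the leaf-walking branch reconstruction described above. The Node class, its fields, and both function names are hypothetical stand-ins; the real Trace types are not shown in this PR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    # Hypothetical stand-in for a trace node; the real type is not shown here.
    parent: Optional[int]  # index of the parent node, None for the root
    n_tokens: int


def is_linear(nodes: list[Node]) -> bool:
    """True when append order forms a single root-to-leaf path."""
    return all(
        node.parent == (i - 1 if i > 0 else None)
        for i, node in enumerate(nodes)
    )


def build_branches(nodes: list[Node]) -> list[list[int]]:
    """Derive every root-to-leaf path by finding leaves and walking parents."""
    parents = {node.parent for node in nodes if node.parent is not None}
    branches = []
    for leaf in range(len(nodes)):
        if leaf in parents:
            continue  # some node points at this one, so it is not a leaf
        path = []
        cur: Optional[int] = leaf
        while cur is not None:
            path.append(cur)
            cur = nodes[cur].parent
        branches.append(path[::-1])  # reverse into root-to-leaf order
    return branches


linear = [Node(None, 3), Node(0, 4), Node(1, 5)]
forked = [Node(None, 3), Node(0, 4), Node(0, 5)]
```

In this sketch the linear check is a single pass with no allocation, while build_branches allocates one list per leaf, which is the cost the PR avoids paying more than once per limit check.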
Performance
Measurements use median time.perf_counter() wall time with GC before each repetition; peak Python allocation is measured separately with tracemalloc so tracing overhead does not affect timings.
The branched peak remains unchanged because both paths hold at most one full branch snapshot at a time; the saving is reduced allocation churn and graph-walk CPU from eliminating additional snapshots. Since limit checks run synchronously from the interception session, the wall-time reduction also shortens the corresponding event-loop stall.
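The measurement method described above could look roughly like the following. This is a sketch only: the benchmark scripts are excluded from the commit, so the function names and the repeat count are assumptions.

```python
import gc
import statistics
import time
import tracemalloc
from typing import Callable


def median_wall_time(fn: Callable[[], object], repeats: int = 21) -> float:
    """Median wall time in seconds, collecting garbage before each repetition."""
    samples = []
    for _ in range(repeats):
        gc.collect()  # start each repetition from a clean heap
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def peak_allocation(fn: Callable[[], object]) -> int:
    """Peak traced Python allocation in bytes, measured in a separate run."""
    gc.collect()
    tracemalloc.start()
    try:
        fn()
        _, peak = tracemalloc.get_traced_memory()  # returns (current, peak)
    finally:
        tracemalloc.stop()
    return peak
```

Keeping the two measurements in separate runs means the timing loop never executes with tracemalloc enabled.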
At higher token density, mask summation becomes a larger share of the work: the 2k-node / 10-branch case at 128 tokens per node measured 8.711 ms → 6.964 ms, saving 1.747 ms (20.1%). This is expected because the change targets graph reconstruction rather than token-mask arithmetic.
Scope
The commit contains only verifiers/v1/interception/server.py. Benchmark scripts, focused test scaffolding, project metadata, and lockfiles are intentionally excluded.
Note
Low Risk
Single-method performance refactor in limit checking; semantics are intended to match prior branch-based aggregation with lower allocation and graph-walk cost.
Overview
RolloutLimits.reached in the v1 interception server now evaluates token caps with less repeated work, without changing limit order (max_turns → input → output → total) or >= stop behavior.
When no token limits are set, it returns immediately after the turn check. For traces whose nodes form a single linear chain (each node’s parent is the previous index), it counts from trace.nodes directly instead of building trace.branches. For branched or non-canonical graphs, it materializes trace.branches once and sums prompt_len, completion_len, and total_tokens across branches for all enabled caps, replacing separate trace.prompt_len / completion_len / total_tokens reads that each reconstructed branch views independently.
Reviewed by Cursor Bugbot for commit 5f30b53.
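The control flow in this summary can be sketched as follows. It is not the code from server.py: the Limits field names, the string return values, and the count_tokens callback are assumptions made for illustration. The sketch shows only the ordering, the >= comparison, the early return, and the single shared count.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Limits:
    # Hypothetical field names; the real RolloutLimits attributes may differ.
    max_turns: Optional[int] = None
    max_input_tokens: Optional[int] = None
    max_output_tokens: Optional[int] = None
    max_total_tokens: Optional[int] = None


def reached(
    limits: Limits,
    turns: int,
    count_tokens: Callable[[], tuple[int, int, int]],
) -> Optional[str]:
    """Return the name of the first limit hit, or None if none is hit."""
    # The turn check comes first and needs no token counting.
    if limits.max_turns is not None and turns >= limits.max_turns:
        return "max_turns"
    caps = (
        limits.max_input_tokens,
        limits.max_output_tokens,
        limits.max_total_tokens,
    )
    if all(cap is None for cap in caps):
        return None  # no token caps enabled: skip counting entirely
    # Count once and reuse the result for every enabled cap.
    n_in, n_out, n_total = count_tokens()
    for name, cap, used in zip(("input", "output", "total"), caps, (n_in, n_out, n_total)):
        if cap is not None and used >= cap:
            return name
    return None
```

Passing the counter as a callback keeps the expensive step lazy, so it runs at most once per call and never runs when only max_turns is configured.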
Note
Reuse rollout token counts across limit checks in RolloutLimits.reached
- Return early from RolloutLimits.reached when all token caps are None, avoiding unnecessary computation.
- For linear traces, count tokens from per-node data (node.token_ids lengths and masked token counts) rather than trace-level aggregates.
- For branched traces, replace trace-level properties (trace.prompt_len, etc.) with sums across trace.branches.
Macroscope summarized 5f30b53.